2: 



Bayesian inference on dependence in 
multivariate longitudinal data 

Hongxia Yang* a , Fan Li 5 , Enrique F. Schisterman c , 
Sunni L. Mumford c and David Dunson 5 



a Watson Research Center (Yorktown), IBM, Statistical Analysis & Forecasting, Mathe- 
matical Sciences Department, NY, 10603 

^Department of Statistical Science, Duke University, Durham, NC 27708-0251 
c Eunice Kennedy Shriver National Institute of Child Health & Human Development, 
National Institutes of Health, Bethesda, MD 20892 



1 



Abstract 



In many applications, it is of interest to assess the dependence structure in 
multivariate longitudinal data. Discovering such dependence is challenging 
due to the dimensionality involved. By concatenating the random effects 
from component models for each response, dependence within and across 
longitudinal responses can be characterized through a large random effects 
covariance matrix. Motivated by the common problems in estimating this 
matrix, especially the off-diagonal elements, we propose a Bayesian ap- 
proach that relies on shrinkage priors for parameters in a modified Cholesky 
decomposition. Without adjustment, such priors and previous related ap- 
proaches are order-dependent and tend to shrink strongly toward an AR- 
type structure. We propose moment-matching (MM) priors to mitigate such 
problems. Efficient Gibbs samplers are developed for posterior computation. 
The methods are illustrated through simulated examples and are applied to 
a longitudinal epidemiologic study of hormones and oxidative stress. 

Key WORDS: Cholesky decomposition, covariance matrix, moment-matching, 
oxidative stress, random effects, shrinkage prior. 
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1 Introduction 



In biomedical applic ations, there i s incre asing interest in the analysis of multivariate lon- 
gitudinal data, with 



Fieuws et al 



(120071) providing a recent review of the literature in 



this area. When the dependence structure between different res ponses is not of inter 



est, on e can potentially use marginal models for each response. 



Gray and Brookmever 



(120001) use such an approach to combine inferences about a treatment effect, using gen- 
eralized estimating equations for model fitting. When the focus is instead on the time- 
varying relationship between the different longitudinal responses, one can use a multivari- 
ate random effects model, wh ich allows correlations between random ef fects in compo- 



nent models for each respon se (iShah et al. 



19971 : 



Chakrabortv et al. 



,3, 



2003, among others). 



Fieuws and Verbekd (120041) showed that the random effects approach to joint modeling 
can sometimes produce misleading results if the covariance structure is misspecified. 

A well known problem that arises in fitting a joint random effects model to multivari- 
ate longitudinal data is the presence of many unknown parameters in the random effects 
covariance matrix. This makes standard methods for fitting random effects models subject 
to convergence problems. Even when the covariance matrix can be estimated, the estimate 
tends to have a large variance and typical methods do not allow for inferences on whether 
off-diagonal elements of the random effects cov ariance are non-zero . These issues lead 
to difficulties in interpretation, which motivated 



Putter et al 



(120081) to develop a latent 



class modeling approach. In this article, we instead attempt to improve the performance 
of the joint random effects modeling approach through the use of a Bayesian method with 
carefully-chosen priors placed on the covariance matrix to favor sparsity. 

This article is motivated by data from the BioCycle study, which collected longitu- 
dinal measureme nts of markers of oxidat i ve str ess and reproductive hormones over the 



menstrual cycle (IWactawski-Wende et al. 



200% . The goal is to improve understanding 



of the dynamic relationship between these variables as this relationship has complicated 
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studie s in women of reproductiv e age with adverse health effects attributable to oxidative 



stress ( Schisterman et al 



2010). In this study, fertility monitors were used to time clinic 



visits and blood draws during two menstrual cycles from 259 women (IHowards et al. 



200% . Visits were scheduled within each cycle during (1) menstruation, (2) mid follic- 
ular phase, (3) late follicular phase, (4) luteinizing hormone (LH) /follicle stimulating 
hormone (FSH) surge, (5) ovulation, (6) early luteal phase, (7) mid luteal phase and (8) 
late luteal phase. Serum samples were assayed for hormone levels including estradiol 
(E2) and oxidative stress levels as measured by F2 Isoprostanes (F2Iso). In this paper, 
we focus on investigating the relationship between F2Iso, a biomarker of oxidative stress 
levels, and estradiol (E2). The BioCycle Study provides a unique opportunity to study de- 
pendence in hormone and oxidative stress trajectories. Hormonal patterns tend to follow 
patterns regulated by the hypothalamic -pituitary-ovarian axis, and are strongly correlated 
from cycle to cycle. 

Following common practice for m ultivariate longi t udinal data analysis, we initially 



consider a linear mixed effects model (ILaird and Ware , 



1982!) for each response. In par- 



ticular, let yhij denote the measurement of response type h for subject i at visit j, with 
h = 1 for log-transformed E2 and h = 2 for log-transformed F2Iso, and i = 1, . . . , n, 
j = l,...,m. Although our methods focus on the bivariate case, they apply directly to 
general multivariate longitudinal response data. We allow for unequal number and spac- 
ing of visits f or the different women, assuming the visits are missing at random (MAR) 



dRubin 



19761) . This assumption is deemed appropriate based on discussions with the 
study investigators, as it is unlikely that the missing scheduled visits were related to the 
F2Iso and E2 measurements on the day of the missed visit. Letting x^j and denote 
the p x 1 and q x 1 vectors of predictors, we assume 



Vhij 



^hijP + z 'hij h hi + e/jjj, h hi ~ N g (0, ft), e hij ~ N(0, a 2 ), (1) 



where (3 is a vector of unknown fixed effects parameters, is a vector of random effects 



and is assumed independent of the measurement error e^ij, is the q x q random effects 
covariance matrix that reflects the dependence structure within and across responses, and 
a 2 is the residual variance. 

The joint mixed effects model £T|) is flexible in allowing separate fixed and random 
effects for each response through the appropriate choice of x^y and z^./, while accom- 
modating dependence in the longitudinal trajectories through dependence in the random 
effects. Such dependence is measured by the off-diagonal elements in the random effects 
covariance matrix fi. In the BioCycle study, there is substantial variability in both F2Iso 
and E2 across the menstrual cycle as shown in Figured] Prior substantive knowledge sug- 
gests that the trajectories of F2Iso and E2 over the cycle may differ for different women, 
especially by menopausal status and body fat distribution. Although we expect the pat- 
terns to be more similar among women in the BioCycle study who were selected into the 
study because they were healthy and regularly menstruating, there still exists considerable 
variability. Hence, when studying certain populations it may not be reasonable a priori to 
assume a simple parametric model, such as a random intercept model. We instead assume 
separate fixed and random effect coefficients for each visit. This results in p = 9 (inter- 
cept and coefficients for the eight visits from each woman) and q = 16 (total number of 
responses if the woman attended all of her scheduled visits for the two cycles), for a total 
of 16 x 15/2 + 9 = 129 fixed and random effects parameters to be estimated from the 
data of only 259 women. 

In addition to the well-known problems of estimating a large number of parameters 
without regularization, frequentist fitting of linear mixed models with large numbers of 
random effects encounters computational problems in requiring many inversions of a large 
covariance matrix. The covariance matrix estimate is often ill-conditioned in such cases, 
with the ratio between the largest and smallest eigenvalues being large. This leads to 
amplification of numerical errors when the matrix is inverted, resulting in either a lack 
of convergence or apparent convergence to a poor estimate having substantial bias and 
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high variance. In fact, we first attempted to fit this model u sing a standard frequentist 



approach implemented in R 2.10.1 with the lme() function (IPinheiro and Bates. 



Lindstrom and Batesl 



1996; 



19821) . but failed to obtain convergence for the BioCycle data. 

Given these problems, and our interest in inferences on certain off-diagonal elements 
of the random effects covariance matrix ft, we instead adopt a Bayesian approach. The 



typical Baye sian approach to linear mixed effects models (e.g., 



Gilks 



Zeger and Karim , 



1991 



1993), either assumes a priori independence among the random effects or chooses 
an inverse-Wishart prior distribution for the random effects covariance structure. How- 
ever, since the inverse-Wishart prior incorporates only a single degree of freedom, it is 
not flexible enough as a shrinkage prior for a high-dimensional covariance matrix. One 
natural solution is to choose a prior that favors sparsity, shrinking most insignificant ele- 
ments of the covariance matrix to values close to zero. This can stabilize estimation and 
improve inferences on significant dynamic correlations. 

A variety of the shrinkage priors for ft have been proposed in the literature, achieving 
model flexibility while not sacrificing the positi ve definite constraint through the use of 



matrix decompositions. 



Daniels and Kassl ( 



wards a diagonal structure. 



3i 



19991) proposed p riors that favor shrinkage to- 



op 
li( 



Daniels and Pourahmadil (|2002l ) developed alternative priors 



based on a Cholesky de composition, giving advantages in interpretation and computation. 



Smith and Kohnl (12002!) proposed a parsimonious covariance estimation approach for lon- 



gitudinal data that avoids explicit specification of ran dom effects. Motivated by the prob 



lem of selecting random effects with zero variance, 



Chen and Dunson 



(2003) proposed 



a m odified Cholesky de composition that facilitates choice of co nditionally-conjuga t e pri 



ors. 



Pourahmadi 



Chen and Dunson 



(120031) 



(2007) demonstrated appealing properties of the 
decomposition in terms of separation of the variance and correlation parameters. 

However, we find that posterior computation for the previously proposed sparse shrink- 
age priors generally does not scale well as the number of random effects increases and 
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there are issues in overly-favoring shrinkage towards AR-type covariance structures. Mo- 
tivated by the multivariate longitudinal BioCyc le data, we propose a new class of heavy- 
tailed shrinkage priors on the parameters in the lChen and Dunsonl (12003b decomposition. 
These priors are robust and introduce substantial computational advantages. It is noted 
that shrinkage priors under Cholesky-type decomposition have computational advantages 
but induce order dependence and tend to over-shrink as the locations of the covariance 
matrix move further off the diagonal. To mitigate this problem, we propose moment- 
matching priors. Efficient Gibbs samplers are developed for posterior inferences under 
both priors. 

In Section 2, we describe the modified Cholesky decomposition of the covariance ma- 
trix and propose new shrinkage priors for the parameters in this decomposition. In Section 
3, we describe the order dependence phenomenon and propose the moment-matching pri- 
ors. Section 4 outlines a simple Gibbs sampling algorithm for posterior computation. 
Section 5 applies the methods to simulated datasets. Section 6 considers the application 
to the BioCycle study and Section 7 concludes with a discussion. 



2 Shrinkage Priors for Random Effects Covariance 
Matrices 



In order to carry out a Bayesian analysis of model CD, we adopt the m odified Cholesky 



decomposition of the covariance matrix ft by 



Chen and Dunsonl (120031) . 



I 



ft = AIT'A, 



(2) 



where A = diag(Ai, . . . , \ q ) is a diagonal matrix with A; > for I = 1, . . . , q, and T is 
a q x q unit lower triangular matrix with j m i in entry (m, I). The diagonal elements of A 
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and the lower triangular elements of T are vectorized as follows, 



A = (Al, . . . ,X q )', 7 = (721,731,732, • • • ,lq,q-2,lq,q-l)'- 

The elements of A are proportional to the standard deviations of the random effects. Set- 
ting A/ ps is effectively equivalent to excluding the Zth random effect from the model. 
By doing so, we move between models of different dimensions, while keeping the covari- 
ance matrix of the random effects in each of these models positive definite. The elements 
of 7 characterize the correlations between the random effects. 

Reparameterizing £T|) with the modified Cholesky decomposition, we have 



Vhij 



x.' hij (3 + z' hij ATa hi + e hij , a hi ~ N ? (0, I q ), e hij ~ N(0, a 2 ), (3) 



where I q denotes a q x q identity matrix. Following IChen and Dunsonl (120031) . we define 
two vectors 

Uhij = (ahilXmZhijm)' , thij = I Zhijl{a,hil + ^ ahimlml) \ , 1 < I < m < q. 

^ m=l ' 

Then © can be rewritten as, 



Uhij - X-'hijfl -'^2 a hil^lZhijl = u'hijf + € hij , 



1=1 



Vhij—^hiiP — t'hijX + e hij- 



(4) 

(5) 



Therefore prior distributions for Q can be induced through priors on A, 7 and all the 
model parameters can be updated as in the normal linear regression. 

We first introduce the priors for the fixed effects (covariates) coefficients (3. When 
the number of covariates is large, subset-selection is often desirable. In the Bayesian 
literature, this is usually achieved by introducing a latent variable J\ G {0, 1} for each 
covariate that indicates wheth er it is included in the model, and assuming a spike and slab 



prior for /?/ conditional on J/ (IGeorge and McCulloch , 



1997 



Smith and Kohn , 



1996). Let 



f3 J = {Pi : Ji = 1} be the set of coefficients of the selected fixed effects and X J be 
the corresponding covariates matrix. We assume a standard i.i.d. Bernoulli prior for Jf. 
Ji ~ Bernoulli (po), and express prior ignorance by setting pq ~ beta(a p , b p ). Then, for 
each of the Pi's with Ji = 0, we assume the prior to be a point mass at 0; and for f3 J , we 



assume a Zellner g-prior (IZellner and Siow 



1980), 



/3 J ~N(0,c7 2 (X J X J r7 5 ), s~G(l/2,JV/2), 

where a 2 follows a Jeffrey's prior a 2 oc l/a 2 is the same a 2 in model (Q]), N = J2 n i an ^ 
G(a, b) denotes a Gamma distribution with mean a/b and variance a/b 2 . 

For A, we consider another point mass mixture prior similar to that of the /3's, allow- 
ing for random effect selection. Specifically, we assume an i.i.d. zero-inflated half-normal 
distribution for A; (7 = l,...,q), 

Ajl^-p^o + a-pON+CO,^), $ ~ IG(l/2, 1/2), (6) 

where 6q is a point mass at and N + (0, <j>f) is the normal distribution N(0, 4>f) truncated 
to its positive support. When A; > for all I, the decomposition in Q guarantees that £1 
is positive definite and A and T are identifiable. When A^ = 0, elements of the resulting 
ft in the Zth row and Zth column are 0. The submatrix of ft formed by removing the Ith 
row and Ith column will still be positive definite. Therefore we are able to move between 
models with different dimensions by removing these rows and columns while still keeping 
the covariance matrix of the random effects of all these models positive definite. The 
hyperparameter p\ represents the prior probability of A; = and is set to be 0.5 to express 
prior ignorance. The induced marginal prior for A^ from © is a mixture of a heavy-tailed 
truncated Cauchy distribution and a point mass at zero. 

The parameters of primary interest in this study are the correlations of the random 
effects, which depend on 7. Without restriction, the large number of unknown parameters 
in 7 relative to the sample size can lead to difficulty in model fitting. We thus consider 



the fo llowing Normal-Exponential-Gamma (NEG) shrinkage prior (IGriffin and Brown , 



20071): 

TmzlVw ~N(0,<7 2 </w), i> ml ~Exp(5 2 /2), <5 2 ~ G(c , do). (7) 

The hyperparameters (co, do) control the degree of model sparsity. A larger Co and/or a 
smaller do lead more coefficients to be close to zero. The prior has fatter tails and larger 
variance as do increases. We set Co = 1 to introduce more shrinkage and let do ~ G(l, 1) 
to make the priors more flexible. 



3 Moment Matching Prior 



As noted in IPourahmadil (120071) . a perceived order among the variables is central to the 
statistical interpretations of the entries of A and T as certain prediction variances and 
moving average coefficients. For longitudinal and functional data there is a natural time- 
order, while for others, the context may not suggest a natural order. The intrinsic order 
depe ndence in shrinkage prio rs based on Choleskey-type decompositi ons, including not 



only 



Chen and Dunson 



(12003h but also 



Daniels and Pourahmadil (|2002h . favors shrinkage 



towards an autoregressive-type covariance structure. Such methods can over shrink non- 
zero covariance not close to the diagonal. This motivated us to develop the following MM 
prior to mitigate such order dependence problems. 

Let 7f m n denote the mth and Ith row of the lower triangular matrix T and jUr m n denote 
the corresponding prior mean for 7r m n , 



l[ml] = (7ml) • • • )7m,m-l)7Jl> • • • >7M~l)> ^[ml] = (Mi • • • > Mm,m-lj Ml, 

1 < I < m < q. 



Also denote the correlation matrix corresponding to fi by p. 



Chen and Dunsonl (12003b 
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showed that p m i, the (m, Z)th entry of p, is determined solely by 7[ OT n as follows, 



Pmi = h(l[ml] ) = 7mt + 2, r=1 7tr7mr (g) 

/ (i + Efci7&)(n-E^ 1 7^) 



This property is crucial to the introduction of the MM prior. Our key idea is to pair-wisely 
match the first and second prior moments of p to those induced from the priors for 7's. 
The first order Taylor expansion of h{7\ m n) at the prior mean of 7r m /i gives, 

h{j[ m i]) ~ %M) + V%[ m j|)'('T[ B1 i]-/iH) l (9) 

where VM/W = (^fe^ • • • , • • • , Applying the ex- 

pectation operator with respect to the prior distribution of 7^ to (O, we have 

E(p mi ) = E{/ i ( 7M )} « = MmZ + E^Mw l < / < m < g . 



(10) 

Fixing the values of E(/o m j)'s and replacing the approximation by equation (flOl . we define 
a system of g(g — l)/2 equations for the prior means /xr m n's. Similarly, applying the 
variance operator to (O, we have 

Var(p m/ ) = Var{/t(7 [mi ])} « V/t(/x [m i])'* 7[mi] V/i(/i [m q), 1 < I < m < q, 

where ^ 7[m;] is the prior covariance matrix of 7[ m q, with the variance of 7 m fc denoted 
by ipmk and the covariance between 7^ and 7 m & denoted by ipij >m k- Rewriting the ma- 
trix product in the form of summations and replacing the approximation by the equation 
above, we have 

:m-l , /dfc(M[roi])\2 oV mi-l / g^O^mi]) dh{jj, [ml] ) 



Var(p, 



mZ J 



Z^fc=l Vmfcl, d/lmk ) + Z l^l<k<jVmk,mj-gJJ^- k ~ , IOr t - I, 

2 r 
(ii) 



Var( m ) + Var( Pml ) + 2 ^ ^T=i ^ ^ ^fc'* » for 2 < / < m < 
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where 

dml[m(l + YT=l Vmr) - (Mmi + Er=l Mr Vmr) Hmk] , for k<l< 



dp 



mk 



dmft[(l + I]"=1 1 ~ Vmk(Pmk + Sr=l Wbr/w)] , for k = l< 

— d m lPmk{Pml + Ylr=l Vlrlhnr), f° r I < k < 

with d mZ = (1+Er=\ M?r)" 1/3 ( 1 +E™"i 1 mL)~ 3/2 - When Var(p m/ )'s andE(p mZ )'s are 
pre-fixed, (TTTT t defines a system of g(g — 1) /2 equations for the prior covariances ^ . 's. 

Lacking prior information on the random effects, it is reasonable to assume that all 
elements of the correlation matrix p have equal mean and variance a priori, leaving the 
data to adjust for the real correlations. If we assume a common prior mean u and variance 
v for p's, then (u, v) should be in the range u € [—1, 1] and u ± 3y/v S [—1, 1] to satisfy 
the condition p m i G [—1,1]. Solving (fTOl ). we have, 



/ l + (m-2)it _ / 1-u 

Mmi-uy (l + (m _ 1)u)(1 _ u) . AW-Mmiy (i + (i_ 2 )u)(l + (i-l)u)- 

(12) 

The system of equations (fTTI ). however, is in general under-identified because the num- 
ber of unknowns is larger than the number of equations. Unde r reasonable s implif ying 



Pourahmadil (120071) . we 



assumptions motivated by the form of ([8]) and interpretations of '. 
assume that j m i, . . . , 7 m , m _i are independent of each other, the 7 m ;S have common vari- 
ance, and the correlations between 7y and 7 m & (I ^ m) are equal. Thus, the number of 
unknowns and equations become the same and unique solutions for ip m \ and ip m 2 can be 
written with the above assumptions for all m by, 

= g ( ^w 2 

fcl 

^ m2 = _, /(2 gg^^w } . (13) 

Given the means and variances of p, we can calculate the corresponding means and vari- 
ances of 7 m / 's through (fT2l and ([T3T ). To test the effectiveness of the transformation, we 
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can generate 1000 times and obtain the corresponding estimated prior distributions 
of p through ©. To set values for (u, v), we want to both shrink nonsignificant values as 
much as possible by setting u close to zero and leave out significant values by setting v as 
large as possible but within the constraint that u £ [—1,1] and u ± 3y/v £ [—1,1]. With 
the above two criteria, to test the effectiveness of the MM priors, we experiment with 
different values of (u,v), (0.05,0.1), (0.1,0.09), (0.15,0.08), (0.2,0.07) with different 
dimensions. In all the experiments, order-dependence is clearly avoided as the entries of 
p move further off the diagonal. The prior distributions are still approximately N(u, v). 
We notice that with u = 0.05 or 0.1, the resulting elements of the estimated p have rela- 
tively larger ranges, while as u increases, the range decreases. To achieve more flexibility, 
we can set weakly-informative priors for u and v as u ~ ~N{hq,(j 2 )1{[j, e [ — 1> M) an( ^ 
v ~ IG(co, do)l(u ± ?>y/v £ [—1, 1]) respectively. The corresponding priors for p and 
^ can then be calculated from model (fT2l) and (fT3T > and some Jacobian computation is 
needed. 



4 Posterior Inferences 



The posterior distribution is obtained by combining priors and the likelihood in the usual 
way. However, direct evaluation of the posterior distribution seems to be difficult. The 
joint posterior distribution for 6 = (/3, A, 7, a 2 ) in model (3) is given by, 



p(0\y) oc 



H N g (a i; 0, l q ) [] { J] N(y hij ; x' hij P + z'^Ara,^ 2 )} 

h j=i 



i=l 



p(a 2 )p((3,J,g)p(X, 7 ),(U) 



which has a compl ex form that makes direct sampling infeasible. Therefore we employ 



the Gibbs sampler (iGelfand and Smithl . 



19900 by iteratively sampling from the full condi- 



tional distributions of each parameter given the other parameters. The details of our Gibbs 
sampler is given below: 
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1. Sampling fixed effects parameter (3 through, 



1 „J „J> \-i 



where Egj = jJjCEh.ij 



with 
Jj = 1}- 



and /igj = S^j Eh,<j ^hij^hij, 



Uj = Vhij - z' hij ATa.i and cc^ denoting the subvector of x hij , {x hij i : 



2. Sampling g through the following conjugate Gamma distribution, 

9 ~ G 



where PJ = £? =1 1(^1 = 1) and N = £ 4 



■m. 



3. Updating J\ individually, following results from Smith and Kohn (1996), we have 

1 



with hi 



pi 



p(Ji = 1| ■ • • ) = 
+ i) f {^}^and5(J) 



1-Pin i i\i fS(Ji=i)i" „„a Gf n — ^ L^'y^y^'yA-Iy./', 



1+9 



4. Sampling A; individually from a inflated half-normal distribution with Chij = Vhij — 

p{M • • • ) = Zl-N + (p h X h af) 



with pi 



Pi 



P ' +t P ' j iV(0;A>2) i-*(0;0,D 
t 2 



5. Updating 7 through the following two circumstances, 



i. If the prior is as described in |7]), following 



Park and Casellal (120081) . we can 



use blocked Gibbs sampler to update 7's and their concentration parameters 
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as following, 




m,l 



with fij = S 7 J2 h ,i ,j ^u hij w hlj and S 7 = (Y,h,i 
is a diagonal matrix with diagonal elements of ip m i. 




ii. If the prior is the MM prior, 7 is updated by, 



p( 7 |...)~N(7,l2) 



where R = (a 2 ^2 hij u hij u T hij +^> x ) 1 and 7 = R{o 2 £ Mj u hij (y hij - 
x hijft J ) + v I / ~ 1 /^}- H and \I/ are obtained from the MM priors described in 
Section 3 and can be updated through the random walk Metropolis-Hastings 
method if hyperpriors u and v are not fixed. 

6. Sampling random effects from 



with /x aw = S aw J2j ^'A{ hij and £ aw = (£\ ^T'Az hij z' hij AT + 1,)" 1 . 
7. Sampling a 2 with 9 hij = y hij - x^/3 J - z' hij ATa hi by, 



After discarding the draws from the burn-in period, we can estimate posterior summaries 
of the model parameters in the usual way from the Gibbs sampler output. 



p(*hi\ ■■■) 



• • • ) ~ Inverse-Gamma(iV/2, ^ 6^/2) 
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5 Simulations 



In this section, we examine the performance of the proposed priors on simulated data. 
Since our primary interest is on the covariance matrix of random effects, we assume there 
are no fixed effects in the simulations. Data are generated from model ([3]) with z^j = l g . 
Six representative structures (Figure |2]) are considered. 

1. The identity structure: ft is the identity matrix so all random effects are indepen- 
dent. 

2. The tri-diagonal structure: Q has unity diagonal entries with the immediate off- 
diagonal entries being -0.488, corresponding to the covariance matrix of a MA(1) 
model with decay parameter 0.8. The remaining entries are zero. 

3. The circulant structure: similar to the tri-diagonal structure except for an additional 
pair of entries at (1, q) and (q, 1) being set to 0.4. 

4. The block diagonal structure: fi has six blocks, each viewed as a separate covari- 
ance matrix with the entries decreasing from unity at the rate of 0.8 as a function 
of the distance from the diagonal (i.e., the immediate off-diagonal entries are 0.8 
and the next off- diagonals are 0.8 2 , and so on). This resembles the situation where 
the variables are divided into several independent groups and variables within the 
same group are closely connected. 

5. The random structure: Q has the diagonals being unity, the immediate off-diagonal 
entries being 0.4, and some other entries having randomly selected values. We also 
experiment with other values for the immediate off-diagonal entries. This structure 
is similar to that in our application, where the data are longitudinal but can have 
significant points further off the diagonal entries. 
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6. The full structure: similar to the block diagonal structure, but all variables are now 
in the same group. Entries decay at the rate of 0.8 as they swing away from the 
main diagonals, resembling an AR(1) structure. 



For each structure, we simulate a data set with 200 subjects, each having 15 visits and 2 
outcomes per visit. In total, there are 30 random effects for each subject, i.e., q = 2 x 15 = 
30. 

We first try to estimate the model with functions from the R package nlme, which can 
fit and compare Gaussian linear and nonlinear mixed-effects models. We can only get esti- 
mation when the covariance matrix is diagonal, while all the others fail with an error mes- 
sage "iteration limit reached without convergence". It seems that the package nlme can 
only deal with small dimensional data, e.g., q is small. We then try the R package corpcor 
for comparison with our proposed methods. This package implements a James-Stein-type 
shrinkage estimator for the covariance matrix, with se parate shrinkage for variances and 



correlations. The details of the met hod are explained in 



Schafer and Strimmeri (120051) and 



Opgen-Rhein and S trimmer! (|2007T) . In order to compare the estimated covariance matrix 
with different methods (the R package corpcor cannot output the covariance matrix for 
the random effects in the linear mixed effects model), we assume that the residual variance 
is zero when generating the data. Results with the shrinkage and the MM priors are based 
on a Gibbs sampler of 20,000 iterations after a burn-in period of 10,000. Estimations are 
compared based on the squared error loss function, 

^(«^) = ^{EE(^-^) 2 } 1/2 - 

i 3 

where Wjj is in the ith row, jth column of f2 and & is in the ith row, jth column of f2. 
Figure [3] shows the squared error losses of the estimates from the corpcor, the shrinkage 
priors and the MM priors. For simplicity, we set fixed values for (u,v) as (0.05,0.1), 
(0.1,0.09), (0.15,0.08) and (0.2,0.07). Both the shrinkage priors and the MM priors 
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outperform the estimation from the corpcor, except under the diagonal covariance ma- 
trix structure. The corpcor performs best when the true underlying covariance matrix is 
sparse but otherwise tends to over-shrink. When the true underlying covariance structure 
is diagonal, the shrinkage priors and the MM priors perform equally well. The shrinkage 
priors have the smallest squared error losses when the underlying covariance structure is 
tri-diagonal, circulant, block-diagonal and full structure. The MM priors clearly outper- 
form the shrinkage priors when the true underlying covariance structure is random. As 
expected, the estimates of the shrinkage priors under the random structure tend to over- 
shrink the parameters as they move further off the diagonal. 

To further explore the impact of (u, v) values on the performance of the MM priors, 
we calculate the MSE for (u,v) values being (0.05,0.1), (0.1,0.09), (0.15,0.08), (0.2,0.07) 
under different random covariance structures. The difference in MSE from the MM pri- 
ors among the selected (u, v) values under the above simulation settings are very small. 
The immediate off-diagonal values are chosen from {0.2, 0.3, . . . , 0.8} and the randomly 
selected further-off diagonal values are the same in each test for comparison. MSEs are 
shown in Figure|4]and the MM priors perform best with (u, v) = (0.1, 0.09). Since (u, v) 
are hyperpriors for the correlation matrix, which will not be affected by the magnitude and 
scale of the new datasets, we adopt the value (0.1,0.09) in later analyses for simplicity. 
MSEs are smallest when the immediate off-diagonal value is 0.3 and get larger when the 
values get larger. 



6 Application to the BioCycle Study 

Oxygen free radicals have been implicated in spontaneous abortions, infertility in men 
and women, reduced birth weight, aging, and chronic disease processes, such as cardio- 
vascular disease and cancer. It is thought that estrogen may play an important role in 
oxidative stress levels in women. However, little is known about the relation between 
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oxidative stress, estrogen levels, and their influence on outcomes, such as likelihood of 
conception or spontaneous abortions. The primary goals of the BioCycle study are to 
better understand the intric ate relationship between hormone levels and oxidative stress 



during the menstrual cycle (ISchisterman et al 



201CT) . The BioCycle study enrolled 259 



healthy, regularly menstruating premenopausal women for two menstrual cycles. Par- 
ticipants visited the clinic up to 8 times per cycle, at which time blood and urine were 
collected. 

The BioCycle study provides a unique setting for application of the proposed method- 
ology. The data is longitudinal and hormone levels tend to follow predicted patterns across 
the menstrual cycle due to the complex feedback mechanisms which regulate hormonal 
levels through the hypothalamic -pituitary-ovary axis. Further, hormone levels during spe- 
cific phases tend to be correlated from cycle to cycle. 

The responses are transformed to a log scale to make the normal assumption more 
reliable and the predictors are standardized by subtracting the mean and dividing by the 
standard deviation. Responses of F2Iso and E2 from the first 20 subjects are shown in 
FigureQ] We can see certain common trends over visits across the women, but more strik- 
ingly each individual has her own diversity which makes the plots more variable. Linear 
mixed-effects models can accommodate such differences and analyze the longitudinal de- 
pendences among two types of responses varying over visits through the covariance ma- 
trix. Specifically, y^j is the response for type h(h = 1, 2) of subject i (i = 1, . . . , 259) at 
visitj(j = 1,..., 8 for the 8 visits). Letx U j = (1, ^n, a?ii2, x il3 , a^u, 0, 0, 0, 0)g xl and 
%2ij = (1,0,0, 0,0, Xi2i, Xi22, ^23,^124)9x1 be the fixed predictors of response h = 1 
and h = 2, respectively, for subject i at visit j. Let zuj = (0, . . . , luj ,...,0,0,..., 0)' 16x 1 

8x1 8x1 

and Z2ij = (0, . . . , 0, 0, . . . , 1 , • • • , 0) 16xl stand for the random predictors of response 

8x1 8x1 
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1 



hij 



h for subject i at visit j, where 

1, subject i showed up at visit j for response h, 
0, otherwise. 
We attempt to fit model (fl]) with the lme{) function in R 2. 10. 1 but failed, because the esti- 
mates do not converge. We first estimate the covariance structure with the shrinkage priors 
given the data collected longitudinally. In order to capture the possible sporadic significant 
signals, we also estimate model ( []]) with the MM priors with (u,v) = (0.1,0.09). The 



Raftery and Lewis diagnostic (IRaftery and Lewis , 



1995) is used to estimate the number 



of MCMC samples needed for a small Monte Carlo error in estimating the 95% credible 
intervals. The required sample size can be different for each parameter and 20,000 itera- 
tions are found to be enough for all parameters. Convergence diagnostics, such as trace 
plots and Geweke's convergence diagnostic for randomly selected off-diagonal elements 
of the covariance matrix are performed on some selected elements. No signs of adverse 
mixing is found. All results are based on 50,000 Gibbs sampling iterations after a burn-in 
period of 20,000. 

Figures |5]and|6]display the estimated correlation structures for both within and across 
responses for the shrinkage and the MM priors respectively. The left panels are the 
estimated correlation matrices between the two responses and the right panels are the 
zoomed-in cross correlation structures among the two responses. The left upper 8 by 8 
matrix (with the 8 responses from the cycles) is the correlation matrix for response E2 
across the cycle. The right lower 8 by 8 matrix is the correlation matrix for the eight 
F2Iso responses across the cycle. For example, the (2,3)rd cell is the correlation between 
the second visit and the third visit of response E2; the ( 10, 1 l)th cell is the correlation 
between the second visit and the third visit of response F2Iso. The upper right (or the 
lower left) 8 by 8 matrix is the cross correlation among responses of E2 and F2Iso across 
the two cycles. For example, the (l,9)th cell is the correlation between the first visit of E2 
and the first visit of F2Iso. 
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Estimated correlation structures through the MM priors help us have a better under- 
standing of the relation between estrogen levels and F2 Isoprostanes during the menstrual 
cycle: finding more visits with stronger correlations between the two responses. The anal- 
ysis shows that the correlations appear to differ slightly across the menstrual cycle, with 
the cross-correlations being low in general. Further, the 5th visit for F2Iso is much less 
correlated than the others. This could be due to the fact that the mean values of F2Iso 
tend to be lowest at this point during the cycle (around ovulation when estrogen levels 
are high), but otherwise are not varying as much at the other visits. Estimates from the 
shrinkage priors fail to pick up most of the stronger correlations between visits. 

7 Discussion 

This article has proposed two new methods for Bayesian model selection of fixed and 
random effects in continuous models. Our approaches rely on shrinkage priors and MM 
priors to the setting of variable selection of multivariate, correlated random effects with 
large dimension. Clear advantages over earlier approaches include robustness, efficiency 
of posterior computation and overcoming the order dependence problem. 

Our proposed approach is advantageous in that fixed and random effects are selected 
simultaneously. In particular, the prior and computational algorithm represent a useful al- 
ternative to approaches that rely on inverse-Wishart priors for variance components. There 
is an increasing realization that inverse-Wishart priors are a poor choice, particularly when 
limited prior information is available. Although we have focused on LMEs of the Laird 
and Ware (1982) type, it is straightforward to adapt our methods to a broader class of lin- 
ear mixed models, accommodating varying coefficient models, spatially correlated data, 
and other applications. 
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Figure 1 : The first 20 subjects in the data set. Responses LSE2 (logarithm of scaled E2) 
and LSF2I (logarithm of scaled F2Iso) are shown over visits 1-8. 




Figure 3: Squared Error Loss for Maximum Likelihood Estimation(blue line), Moment 
Matching Prior with (u,v) = (0.05, 0.1)(black line), (u,v) = (0.1, 0.09)(green line), 
(u, v) = (0.15, 0.08)(pink line), (u, v) = (0.2, 0.07)(brown line) and Shrinkage prior(red 
line) 
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Values for immediate off diagonal 

Figure 4: Squared Error Loss for (u,v) = (0.05, 0.1)(red line), (u,v) = 
(0.1,0.09)(black line), (u,v) = (0.15, 0.08)(blue line), (u,v) = (0.2, 0.07)(green line) 
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Figure 5 : Estimated covariance matrix between log(E2) and log(F2Iso) 
and the zoomed in cross covariance structure through Moment Match- 
ing prior 
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